Framework For Consistent Speech Databases
Abstract
The speech processing framework introduced here creates phonetically and prosodically annotated speech databases. It provides structured data files in eXtensible Markup Language (XML) format. These files contain all available information about a recorded utterance, including the speech signal. A Document Type Definition (DTD) describes the data structure and makes automatic validation of that structure possible. This ensures that the data can be read correctly by humans and that the speech processing tools in use remain interoperable. A user can browse the speech databases with an ordinary web browser. To visualize the content, the browser has to support XSL transformation, ECMAScript and Scalable Vector Graphics. When the user requests an utterance, the browser fetches the corresponding file with all available information about that utterance. The advantage is that the user obtains the same data as a speech processing tool that uses the underlying file server. Navigating through the different speech layers is like browsing a web page: the user clicks on the part of interest, and an ECMAScript embedded in the file filters the data and updates the display. The script is part of the XSL transformation file. It also allows elementary editing of the utterance content, such as changing word boundaries by moving the corresponding boundary mark. The changes are committed to the web server, which can handle further processing such as integration into a Subversion system. Because the entire content of the speech database consists of strings, a standard search engine can be used to search it. Searching for a phoneme in a particular context returns all available matching phonemes with all of their information, including the speech signal.
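The context search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the DTD, so the element and attribute names below (`utterance`, `word`, `phoneme`, `symbol`, `start`, `end`) and the function `find_phoneme_in_context` are hypothetical, and the embedded speech signal is left out for brevity. The sketch only shows how a phoneme can be looked up by its left and right neighbours once the annotation is stored as XML text.

```python
import xml.etree.ElementTree as ET

# Hypothetical utterance file. The real framework's DTD is not given in the
# abstract, so these element and attribute names are assumptions.
UTTERANCE_XML = """<utterance id="utt001">
  <word text="speech" start="0.00" end="0.42">
    <phoneme symbol="s" start="0.00" end="0.09"/>
    <phoneme symbol="p" start="0.09" end="0.15"/>
    <phoneme symbol="i:" start="0.15" end="0.31"/>
    <phoneme symbol="tS" start="0.31" end="0.42"/>
  </word>
  <word text="data" start="0.42" end="0.80">
    <phoneme symbol="d" start="0.42" end="0.50"/>
    <phoneme symbol="eI" start="0.50" end="0.62"/>
    <phoneme symbol="t" start="0.62" end="0.70"/>
    <phoneme symbol="@" start="0.70" end="0.80"/>
  </word>
</utterance>"""


def find_phoneme_in_context(root, target, left=None, right=None):
    """Return every `target` phoneme whose neighbours match `left`/`right`.

    A neighbour constraint of None means "any neighbour". Phonemes are
    taken in document order, so context may cross word boundaries.
    """
    phonemes = list(root.iter("phoneme"))
    hits = []
    for i, ph in enumerate(phonemes):
        if ph.get("symbol") != target:
            continue
        prev_sym = phonemes[i - 1].get("symbol") if i > 0 else None
        next_sym = phonemes[i + 1].get("symbol") if i + 1 < len(phonemes) else None
        if left is not None and prev_sym != left:
            continue
        if right is not None and next_sym != right:
            continue
        hits.append({
            "symbol": target,
            "left": prev_sym,
            "right": next_sym,
            "start": float(ph.get("start")),
            "end": float(ph.get("end")),
        })
    return hits


root = ET.fromstring(UTTERANCE_XML)
# All occurrences of "t" that follow the diphthong "eI".
matches = find_phoneme_in_context(root, "t", left="eI")
```

Each hit carries the time boundaries of the phoneme, which a tool could then use to cut the matching stretch out of the stored speech signal.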
Similar resources
Stages and method of preparing syllable and diphone speech databases for a Persian text-to-speech system
Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their advantages relative to each other. ...
Utilizing Kernel Adaptive Filters for Speech Enhancement within the ALE Framework
Performance of the linear models, widely used within the framework of adaptive line enhancement (ALE), deteriorates dramatically in the presence of non-Gaussian noises. On the other hand, adaptive implementation of nonlinear models, e.g. the Volterra filters, suffers from the severe problems of large number of parameters and slow convergence. Nonetheless, kernel methods are emerging solutions t...
African speech technology (AST) telephone speech databases: corpus design and contents
The African Speech Technology project is developing telephone speech databases for five of South Africa’s eleven official languages, i.e. South African English, Afrikaans, and three African languages, Zulu, Xhosa, and Southern Sotho. These databases will be fully transcribed – orthographically and phonetically – and will be used for the training and testing of phoneme-based, speaker-independent...
Objective analysis of emotional speech for English and Slovenian Interface emotional speech databases
In this paper we propose a new approach for analysis of emotional speech prosody features. The aim of the analysis is definition of emotional features that characterise emotions. Analysis was performed on emotional speech databases that were recorded in the framework of the project "Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments" (Interface). Th...
A unified approach for speech synthesis and speech recognition using stochastic Markov graphs
With the progress of speech synthesis towards the development of complete TTS systems, the databases of speech synthesizers obtain more and more similarity with databases of speech recognizers. This offers new possibilities in combining systems for speech synthesis and recognition. In a new project, we are developing a speech dialogue system with the synthesis and recognition components using u...
Publication date: 2012